---
title: 'Monitor NVIDIA GPUs using OpenTelemetry'
sidebarTitle: 'NVIDIA GPUs'
---

<Frame>
  <img src="/images/docs-nvidia-banner.gif" />
</Frame>

OpenLIT uses OpenTelemetry to help you monitor NVIDIA GPUs by tracking metrics such as utilization, temperature, memory usage, and power consumption.


## Get Started 

<CardGroup cols={2}>
  <Card title="Using the SDK" href="#using-the-sdk" icon="file-code">
    Collect and send GPU performance metrics directly from your application to an OpenTelemetry endpoint.
  </Card>
  <Card title="Using the Collector" href="#using-the-collector" icon="robot">
    Install the OpenTelemetry GPU Collector as a Docker container to collect and send GPU performance metrics to an OpenTelemetry endpoint.
  </Card>
</CardGroup>


<AccordionGroup>
  <Accordion title="Using the SDK">
    <Steps>
        <Step title="Install OpenLIT">
          Open your command line or terminal and run:
          ```shell
          pip install openlit
          ```
        </Step>
        <Step title="Initialize OpenLIT in your Application">
        You can set up OpenLIT in your application using either function arguments directly in your code or by using environment variables.
        <Tabs>
          <Tab title="Setup using function arguments">

          Add the following two lines to your application code:

          ```python
          import openlit

          openlit.init(
            otlp_endpoint="YOUR_OTEL_ENDPOINT",
            otlp_headers="YOUR_OTEL_HEADERS",
            collect_gpu_stats=True,
          )
          ```

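          Replace:
          1. `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.
          2. `YOUR_OTEL_HEADERS` with the headers required to authenticate with your OTLP endpoint. You can ignore this if you are using OpenLIT.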
          </Tab>
          <Tab title="Setup using Environment Variables">
          
          Add the following two lines to your application code:

          ```python
          import openlit

          openlit.init(collect_gpu_stats=True)
          ```

          Then, configure your OTLP endpoint and headers using environment variables:
          ```shell
          export OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_OTEL_ENDPOINT"
          export OTEL_EXPORTER_OTLP_HEADERS="YOUR_OTEL_HEADERS"
          ```

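          Replace:
          1. `YOUR_OTEL_ENDPOINT` with the URL of your OpenTelemetry backend, such as `http://127.0.0.1:4318` if you are using OpenLIT and a local OTel Collector.
          2. `YOUR_OTEL_HEADERS` with the headers required to authenticate with your OTLP endpoint. You can ignore this if you are using OpenLIT.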
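
The OpenTelemetry convention for `OTEL_EXPORTER_OTLP_HEADERS` is a comma-separated list of `key=value` pairs. As a minimal sketch, the helper below builds that string from a dictionary and sets the variable from Python. `format_otlp_headers` is a hypothetical helper, not part of OpenLIT, and the `Authorization` header is only a placeholder for whatever your backend requires.

```python
import os
from urllib.parse import quote


def format_otlp_headers(headers):
    """Join headers as 'key=value,key=value' with percent-encoded values."""
    return ",".join(
        f"{name.strip()}={quote(str(value).strip(), safe='')}"
        for name, value in headers.items()
    )


# Placeholder header (assumption): substitute what your backend expects.
header_string = format_otlp_headers({"Authorization": "Bearer YOUR_TOKEN"})

# Set the variable before calling openlit.init() so it is available at startup.
os.environ["OTEL_EXPORTER_OTLP_HEADERS"] = header_string
print(header_string)  # Authorization=Bearer%20YOUR_TOKEN
```

Percent-encoding the values keeps spaces, commas, and `=` signs inside a value from being mistaken for separators.
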
          </Tab>
        </Tabs>
        To send metrics to other Observability tools, refer to the [Connections Guide](/latest/connections/intro).

        For more advanced configurations and application use cases, visit the [OpenLIT Python repository](https://github.com/openlit/openlit/tree/main/sdk/python).
      </Step>
    </Steps>
  </Accordion>
  <Accordion title="Using the Collector">
  <Steps>
    <Step title="Pull `otel-gpu-collector` Docker Image">

        You can quickly start using the OTel GPU Collector by pulling the Docker image:

        ```sh
        docker pull ghcr.io/openlit/otel-gpu-collector:latest
        ```
     </Step>
     <Step title="Run `otel-gpu-collector` Docker container">

        Here's a quick example showing how to run the container with the required environment variables:

        ```sh
        docker run --gpus all \
            -e GPU_APPLICATION_NAME='chatbot' \
            -e GPU_ENVIRONMENT='staging' \
            -e OTEL_EXPORTER_OTLP_ENDPOINT="YOUR_OTEL_ENDPOINT" \
            -e OTEL_EXPORTER_OTLP_HEADERS="YOUR_OTEL_HEADERS" \
            ghcr.io/openlit/otel-gpu-collector:latest
        ```

        For more advanced configurations of the collector, visit the [OTel GPU Collector repository](https://github.com/openlit/openlit/tree/main/otel-gpu-collector/).

        **Note:** If you've deployed **OpenLIT** using [Docker Compose](https://github.com/openlit/openlit/blob/main/docker-compose.yml), `127.0.0.1` inside the collector container refers to the container itself, not to the host. Either use the host's IP address in the OTLP endpoint or add the OTel GPU Collector as a service in the same [Docker Compose](https://github.com/openlit/openlit/blob/main/docker-compose.yml) file:

        <AccordionGroup>
          <Accordion title="Docker Compose: Add the following config under `services`">

          ```yaml
          otel-gpu-collector:
            image: ghcr.io/openlit/otel-gpu-collector:latest
            environment:
              GPU_APPLICATION_NAME: 'chatbot'
              GPU_ENVIRONMENT: 'staging'
              OTEL_EXPORTER_OTLP_ENDPOINT: "http://otel-collector:4318"
            device_requests:
            - driver: nvidia
              count: all
              capabilities: [gpu]
            depends_on:
            - otel-collector
            restart: always
          ```

          </Accordion>
          <Accordion title="Host IP: Use the Host IP to connect to OTel Collector">

          ```sh
          OTEL_EXPORTER_OTLP_ENDPOINT="http://192.168.10.15:4318"
          ```
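
If you are unsure which address to use, the following sketch makes a best-effort guess at the host's primary IPv4 address and builds the endpoint URL from it. Both function names are hypothetical, the probe address `192.0.2.1` is a reserved documentation address chosen only to make the OS pick an outgoing interface, and port `4318` is taken from the examples on this page. Run it on the host, not inside the container.

```python
import socket


def detect_host_ip(fallback="127.0.0.1"):
    """Best-effort guess of the host's primary IPv4 address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only asks the OS
        # which local interface would be used to reach this address.
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]
    except OSError:
        # No usable route (for example, no network): return the fallback.
        return fallback
    finally:
        sock.close()


def otlp_endpoint(host_ip, port=4318):
    """Build the OTLP HTTP endpoint URL for the given host address."""
    return f"http://{host_ip}:{port}"


print(otlp_endpoint(detect_host_ip()))
```

If the script prints `http://127.0.0.1:4318`, detection fell back to loopback, which will not work from inside a container. In that case, look up the address manually.
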

          </Accordion>
        </AccordionGroup>

        ### Environment Variables

        OTel GPU Collector supports several environment variables for configuration. Below is a table that describes each variable:

        | Environment Variable            | Description                                                                      | Default Value |
        |---------------------------------|----------------------------------------------------------------------------------|---------------|
        | `GPU_APPLICATION_NAME`          | Name of the application running on the GPU                                       | `default_app` |
        | `GPU_ENVIRONMENT`               | Environment name (e.g., staging, production)                                     | `production`  |
        | `OTEL_EXPORTER_OTLP_ENDPOINT`   | OpenTelemetry OTLP endpoint URL                                                  | (required)    |
        | `OTEL_EXPORTER_OTLP_HEADERS`    | Headers for authenticating with the OTLP endpoint. Ignore if using OpenLIT.      | (optional)    |
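
To illustrate how these variables fit together, here is a minimal sketch that assembles the `docker run` command from Python. `build_run_command` is a hypothetical helper, not part of OpenLIT. The defaults mirror the table above and are passed explicitly only to make the resulting command self-describing.

```python
import shlex

# Defaults copied from the table above; the endpoint has no default.
DEFAULTS = {
    "GPU_APPLICATION_NAME": "default_app",
    "GPU_ENVIRONMENT": "production",
}
IMAGE = "ghcr.io/openlit/otel-gpu-collector:latest"


def build_run_command(endpoint, headers=None, **overrides):
    """Return the docker run argument list for the OTel GPU Collector."""
    if not endpoint:
        raise ValueError("OTEL_EXPORTER_OTLP_ENDPOINT is required")
    env = {**DEFAULTS, **overrides, "OTEL_EXPORTER_OTLP_ENDPOINT": endpoint}
    if headers:
        # Only pass headers when the backend needs authentication.
        env["OTEL_EXPORTER_OTLP_HEADERS"] = headers
    command = ["docker", "run", "--gpus", "all"]
    for name, value in env.items():
        command += ["-e", f"{name}={value}"]
    command.append(IMAGE)
    return command


cmd = build_run_command("http://127.0.0.1:4318", GPU_ENVIRONMENT="staging")
print(shlex.join(cmd))
```

To start the container, pass the returned list to `subprocess.run(cmd, check=True)`, or paste the printed command into a terminal.
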
      </Step>
    </Steps>
  </Accordion>
</AccordionGroup>

<Card title="Collected Metrics" href="/latest/features/metrics#gpu-metrics" icon='table'>
Details on the types of metrics collected and their descriptions.
</Card>

---

<CardGroup cols={2}>
<Card title="Connections" href="/latest/connections/intro" icon='link'>
Connect to your existing observability stack.
</Card>
<Card title="SDK configuration" href="/latest/sdk-configuration" icon='file-code'>
Documentation of the configuration options for the OpenLIT SDK.
</Card>
</CardGroup>